Abstract

The prediction of crystal properties plays a crucial role in materials science and its applications. Current methods for predicting crystal properties focus on modeling crystal structures using graph neural networks (GNNs). However, accurately modeling the complex interactions between atoms and molecules within a crystal remains a challenge. Surprisingly, predicting crystal properties from crystal text descriptions is understudied, despite the rich information and expressiveness that text data offer. In this paper, we develop and make public a benchmark dataset (TextEdge) that pairs crystal text descriptions with their properties. We then propose LLM-Prop, a method that leverages the general-purpose learning capabilities of large language models (LLMs) to predict the properties of crystals from their text descriptions. LLM-Prop outperforms the current state-of-the-art GNN-based methods by approximately 8% on predicting band gap, 3% on classifying whether the band gap is direct or indirect, and 65% on predicting unit cell volume, and yields comparable performance on predicting formation energy per atom, energy per atom, and energy above hull. LLM-Prop also outperforms the fine-tuned MatBERT, a domain-specific pre-trained BERT model, despite having 3 times fewer parameters. We further fine-tune LLM-Prop directly on CIF files and on the condensed structure information generated by Robocrystallographer, and find that LLM-Prop fine-tuned on text descriptions performs better on average. Our empirical results highlight the importance of natural-language input to LLMs for accurately predicting crystal properties, and the current inability of GNNs to capture information pertaining to space group symmetry and Wyckoff sites for accurate crystal property prediction.
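To make the text-to-property setup described above concrete, the sketch below shows one plausible way to regress a scalar crystal property (e.g., band gap) from a Robocrystallographer-style text description using a pretrained transformer encoder with a linear head. The backbone ("t5-small"), mean pooling, the linear regression head, and the example description are illustrative assumptions, not the paper's exact LLM-Prop architecture or training recipe.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

class TextPropertyRegressor(nn.Module):
    """Encode a crystal text description and regress a scalar property."""

    def __init__(self, backbone: str = "t5-small"):  # backbone is an assumption
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(backbone)
        hidden = self.encoder.config.d_model
        self.head = nn.Linear(hidden, 1)  # scalar target, e.g. band gap in eV

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool token embeddings over non-padding positions.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
        return self.head(pooled).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TextPropertyRegressor()

# Hypothetical Robocrystallographer-style description; real inputs would
# come from description/property pairs like those in the TextEdge benchmark.
description = (
    "SrTiO3 crystallizes in the cubic Pm-3m space group. Sr2+ is bonded to "
    "twelve equivalent O2- atoms to form SrO12 cuboctahedra."
)
batch = tokenizer(description, return_tensors="pt", truncation=True)
with torch.no_grad():
    prediction = model(batch["input_ids"], batch["attention_mask"])
print(float(prediction))  # untrained output; in practice, train with an MSE loss

For the classification target mentioned in the abstract (direct vs. indirect band gap), the same skeleton applies with a two-logit head and a cross-entropy loss in place of the regression head.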